Wine Quality Reds Exploration by Jesús Sanz Sanz

This tidy data set contains 1,599 red wines with 11 variables on the chemical properties of the wine. At least 3 wine experts rated the quality of each wine, providing a rating between 0 (very bad) and 10 (very excellent).

Univariate Plots Section

The dataset covers the red variant of the Portuguese “Vinho Verde” wine.
For more details, consult: http://www.vinhoverde.pt/en/ or the reference [Cortez et al., 2009].
Due to privacy and logistic issues, only physicochemical (inputs) and sensory (the output) variables are available (e.g. there is no data about grape types, wine brand, wine selling price, etc.).

Attributes (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Attribute (based on sensory data):
12 - quality (score between 0 and 10)

Description of attributes:
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2 - volatile acidity: the amount of acetic acid in wine, which at too-high levels can lead to an unpleasant, vinegar taste
3 - citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines
4 - residual sugar: the amount of sugar remaining after fermentation stops, it’s rare to find wines with less than 1 gram/liter and wines with greater than 45 grams/liter are considered sweet
5 - chlorides: the amount of salt in the wine
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8 - density: the density of wine is close to that of water depending on the percent alcohol and sugar content
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11 - alcohol: the percent alcohol content of the wine
12 - quality (score between 0 and 10)

## 'data.frame':    1599 obs. of  13 variables:
##  $ X                   : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ fixed.acidity       : num  7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
##  $ volatile.acidity    : num  0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
##  $ citric.acid         : num  0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
##  $ residual.sugar      : num  1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
##  $ chlorides           : num  0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
##  $ free.sulfur.dioxide : num  11 25 15 17 11 13 15 15 9 17 ...
##  $ total.sulfur.dioxide: num  34 67 54 60 34 40 59 21 18 102 ...
##  $ density             : num  0.998 0.997 0.997 0.998 0.998 ...
##  $ pH                  : num  3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
##  $ sulphates           : num  0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
##  $ alcohol             : num  9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
##  $ quality             : int  5 5 5 6 5 5 5 7 7 5 ...
##        X          fixed.acidity   volatile.acidity  citric.acid   
##  Min.   :   1.0   Min.   : 4.60   Min.   :0.1200   Min.   :0.000  
##  1st Qu.: 400.5   1st Qu.: 7.10   1st Qu.:0.3900   1st Qu.:0.090  
##  Median : 800.0   Median : 7.90   Median :0.5200   Median :0.260  
##  Mean   : 800.0   Mean   : 8.32   Mean   :0.5278   Mean   :0.271  
##  3rd Qu.:1199.5   3rd Qu.: 9.20   3rd Qu.:0.6400   3rd Qu.:0.420  
##  Max.   :1599.0   Max.   :15.90   Max.   :1.5800   Max.   :1.000  
##  residual.sugar     chlorides       free.sulfur.dioxide
##  Min.   : 0.900   Min.   :0.01200   Min.   : 1.00      
##  1st Qu.: 1.900   1st Qu.:0.07000   1st Qu.: 7.00      
##  Median : 2.200   Median :0.07900   Median :14.00      
##  Mean   : 2.539   Mean   :0.08747   Mean   :15.87      
##  3rd Qu.: 2.600   3rd Qu.:0.09000   3rd Qu.:21.00      
##  Max.   :15.500   Max.   :0.61100   Max.   :72.00      
##  total.sulfur.dioxide    density             pH          sulphates     
##  Min.   :  6.00       Min.   :0.9901   Min.   :2.740   Min.   :0.3300  
##  1st Qu.: 22.00       1st Qu.:0.9956   1st Qu.:3.210   1st Qu.:0.5500  
##  Median : 38.00       Median :0.9968   Median :3.310   Median :0.6200  
##  Mean   : 46.47       Mean   :0.9967   Mean   :3.311   Mean   :0.6581  
##  3rd Qu.: 62.00       3rd Qu.:0.9978   3rd Qu.:3.400   3rd Qu.:0.7300  
##  Max.   :289.00       Max.   :1.0037   Max.   :4.010   Max.   :2.0000  
##     alcohol         quality     
##  Min.   : 8.40   Min.   :3.000  
##  1st Qu.: 9.50   1st Qu.:5.000  
##  Median :10.20   Median :6.000  
##  Mean   :10.42   Mean   :5.636  
##  3rd Qu.:11.10   3rd Qu.:6.000  
##  Max.   :14.90   Max.   :8.000

1. Fixed acidity (tartaric acid - g / dm^3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.60    7.10    7.90    8.32    9.20   15.90

2. Volatile acidity (acetic acid - g / dm^3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.1200  0.3900  0.5200  0.5278  0.6400  1.5800

We add to the data set the ratio of volatile acidity to fixed acidity, to analyze it later.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01348 0.04405 0.06569 0.06706 0.08581 0.20800

3. Citric acid (g / dm^3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.090   0.260   0.271   0.420   1.000
## 
##    0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09  0.1 0.11 0.12 0.13 0.14 
##  132   33   50   30   29   20   24   22   33   30   35   15   27   18   21 
## 0.15 0.16 0.17 0.18 0.19  0.2 0.21 0.22 0.23 0.24 0.25 0.26 0.27 0.28 0.29 
##   19    9   16   22   21   25   33   27   25   51   27   38   20   19   21 
##  0.3 0.31 0.32 0.33 0.34 0.35 0.36 0.37 0.38 0.39  0.4 0.41 0.42 0.43 0.44 
##   30   30   32   25   24   13   20   19   14   28   29   16   29   15   23 
## 0.45 0.46 0.47 0.48 0.49  0.5 0.51 0.52 0.53 0.54 0.55 0.56 0.57 0.58 0.59 
##   22   19   18   23   68   20   13   17   14   13   12    8    9    9    8 
##  0.6 0.61 0.62 0.63 0.64 0.65 0.66 0.67 0.68 0.69  0.7 0.71 0.72 0.73 0.74 
##    9    2    1   10    9    7   14    2   11    4    2    1    1    3    4 
## 0.75 0.76 0.78 0.79    1 
##    1    3    1    1    1

Citric acid counts generally fall off as the value rises toward 0.75, but three anomalies stand out: a large group of wines at or near 0 (132 at exactly 0), a spike at 0.49 (68 wines), and a single extreme value of 1.00, which could be a recording error or a wine made to taste very fruity.

We add to the data set the ratio of citric acid to fixed acidity, to analyze it later.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.00000 0.01292 0.03291 0.03084 0.04503 0.13929

4. Residual sugar (g / dm^3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.900   1.900   2.200   2.539   2.600  15.500

Using a logarithmic x-axis, the distribution looks approximately normal (bell-shaped).
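As a rough check that needs no plot, the quartiles printed above can be compared on the raw and log10 scales. This is only a sketch in base R: the variable names are illustrative, and it looks at five summary numbers rather than the full distribution.

```r
# Quartiles of residual sugar (g/dm^3), copied from the summary above.
q <- c(min = 0.9, q1 = 1.9, median = 2.2, q3 = 2.6, max = 15.5)
log_q <- log10(q)

# Distance from the median to each quartile, on the raw and log10 scales.
raw_lower <- q[["median"]] - q[["q1"]]
raw_upper <- q[["q3"]] - q[["median"]]
log_lower <- log_q[["median"]] - log_q[["q1"]]
log_upper <- log_q[["q3"]] - log_q[["median"]]

# Ratio of upper to lower gap: 1 means the middle half is symmetric.
raw_ratio <- raw_upper / raw_lower
log_ratio <- log_upper / log_lower

# Length of each tail on the log scale, in units of the interquartile range.
log_iqr    <- log_q[["q3"]] - log_q[["q1"]]
upper_tail <- (log_q[["max"]] - log_q[["q3"]]) / log_iqr
lower_tail <- (log_q[["q1"]] - log_q[["min"]]) / log_iqr

print(c(raw_ratio = raw_ratio, log_ratio = log_ratio))
print(c(lower_tail = lower_tail, upper_tail = upper_tail))
```

The middle half of the data is closer to symmetric after the transform, while the upper tail remains longer than the lower one, so "approximately normal" should be read loosely.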

5. Chlorides (sodium chloride - g / dm^3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.01200 0.07000 0.07900 0.08747 0.09000 0.61100
##       X fixed.acidity volatile.acidity citric.acid residual.sugar
## 837 837           6.7             0.28        0.28            2.4
## 838 838           6.7             0.28        0.28            2.4
##     chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
## 837     0.012                  36                  100 0.99064 3.26
## 838     0.012                  36                  100 0.99064 3.26
##     sulphates alcohol quality relative.volatile.acidity
## 837      0.39    11.7       7                0.04179104
## 838      0.39    11.7       7                0.04179104
##     relative.citric.acid
## 837           0.04179104
## 838           0.04179104

Salt content is tightly clustered around the center (a low standard deviation), but many samples have much higher values. Perhaps the salt counteracts other flavors, such as sugar?
A couple of samples have practically no chlorides. This may be a mistake, or an exceptional wine made to have this characteristic.

6. Free sulfur dioxide (mg / dm^3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    7.00   14.00   15.87   21.00   72.00

7. Total sulfur dioxide (mg / dm^3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    6.00   22.00   38.00   46.47   62.00  289.00
##         X fixed.acidity volatile.acidity citric.acid residual.sugar
## 1080 1080           7.9              0.3        0.68            8.3
## 1082 1082           7.9              0.3        0.68            8.3
##      chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
## 1080      0.05                37.5                  278 0.99316 3.01
## 1082      0.05                37.5                  289 0.99316 3.01
##      sulphates alcohol quality relative.volatile.acidity
## 1080      0.51    12.3       7                0.03797468
## 1082      0.51    12.3       7                0.03797468
##      relative.citric.acid
## 1080           0.08607595
## 1082           0.08607595

Some extreme values can be observed. One might expect these to be low-quality wines, but they are not: both are rated 7 out of 10.

We add to the data set the ratio of free to total sulfur dioxide, to analyze it later.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## 0.02273 0.25926 0.37500 0.38231 0.48485 0.85714

8. Density (g / cm^3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.9901  0.9956  0.9968  0.9967  0.9978  1.0037

9. pH

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.740   3.210   3.310   3.311   3.400   4.010
##       X fixed.acidity volatile.acidity citric.acid residual.sugar
## 87   87           8.6             0.49        0.28            1.9
## 92   92           8.6             0.49        0.28            1.9
## 93   93           8.6             0.49        0.29            2.0
## 152 152           9.2             0.52        1.00            3.4
## 364 364          12.5             0.46        0.63            2.0
## 441 441          12.6             0.31        0.72            2.2
##     chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
## 87      0.110                  20                  136  0.9972 2.93
## 92      0.110                  20                  136  0.9972 2.93
## 93      0.110                  19                  133  0.9972 2.93
## 152     0.610                  32                   69  0.9996 2.74
## 364     0.071                   6                   15  0.9988 2.99
## 441     0.072                   6                   29  0.9987 2.88
##     sulphates alcohol quality relative.volatile.acidity
## 87       1.95     9.9       6                0.05697674
## 92       1.95     9.9       6                0.05697674
## 93       1.98     9.8       5                0.05697674
## 152      2.00     9.4       4                0.05652174
## 364      0.87    10.2       5                0.03680000
## 441      0.82     9.8       8                0.02460317
##     relative.citric.acid relative.sulfur.dioxide
## 87            0.03255814               0.1470588
## 92            0.03255814               0.1470588
## 93            0.03372093               0.1428571
## 152           0.10869565               0.4637681
## 364           0.05040000               0.4000000
## 441           0.05714286               0.2068966

One might think that wines with a pH outside the 3-4 range are not very good. Most of the ones listed above have ratings between 4 and 6 but, curiously, one with a very acidic pH of 2.88 has a rating of 8!
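A listing like the one above can be produced by filtering on pH. The sketch below is a guess at the approach, not the original code: `wqr_sample` is a hypothetical toy data frame holding only a few rows copied from this report.

```r
# Toy data frame built from rows printed in this report (pH and quality only).
# `wqr_sample` is a hypothetical name; the full data set is not loaded here.
wqr_sample <- data.frame(
  X       = c(1, 87, 152, 364, 441),
  pH      = c(3.51, 2.93, 2.74, 2.99, 2.88),
  quality = c(5, 6, 4, 5, 8)
)

# Keep the wines whose pH falls outside the typical 3-4 range.
out_of_range <- subset(wqr_sample, pH < 3 | pH > 4)

# Range of quality scores among those wines.
quality_range <- range(out_of_range$quality)

print(out_of_range)
print(quality_range)
```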

10. Sulphates (potassium sulphate - g / dm^3)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.3300  0.5500  0.6200  0.6581  0.7300  2.0000
##       X fixed.acidity volatile.acidity citric.acid residual.sugar
## 14   14           7.8            0.610        0.29            1.6
## 87   87           8.6            0.490        0.28            1.9
## 92   92           8.6            0.490        0.28            1.9
## 93   93           8.6            0.490        0.29            2.0
## 152 152           9.2            0.520        1.00            3.4
## 170 170           7.5            0.705        0.24            1.8
##     chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
## 14      0.114                   9                   29  0.9974 3.26
## 87      0.110                  20                  136  0.9972 2.93
## 92      0.110                  20                  136  0.9972 2.93
## 93      0.110                  19                  133  0.9972 2.93
## 152     0.610                  32                   69  0.9996 2.74
## 170     0.360                  15                   63  0.9964 3.00
##     sulphates alcohol quality relative.volatile.acidity
## 14       1.56     9.1       5                0.07820513
## 87       1.95     9.9       6                0.05697674
## 92       1.95     9.9       6                0.05697674
## 93       1.98     9.8       5                0.05697674
## 152      2.00     9.4       4                0.05652174
## 170      1.59     9.5       5                0.09400000
##     relative.citric.acid relative.sulfur.dioxide
## 14            0.03717949               0.3103448
## 87            0.03255814               0.1470588
## 92            0.03255814               0.1470588
## 93            0.03372093               0.1428571
## 152           0.10869565               0.4637681
## 170           0.03200000               0.2380952

The wines with the highest sulphate levels tend to have a low pH (several are below 3). We will check later whether these two variables are related.

11. Alcohol (% by volume)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    8.40    9.50   10.20   10.42   11.10   14.90

12. Quality (score between 0 and 10)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   5.000   6.000   5.636   6.000   8.000
##    Low Medium   High 
##     63   1319    217
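The Low/Medium/High grouping can be sketched with `cut()`. The break points below are an assumption, since the report does not state them; they were chosen so that a score of 3 maps to Low and a score of 8 maps to High, as in the rows printed in this section.

```r
# First ten quality scores, as shown in the str() output above.
quality <- c(5L, 5L, 5L, 6L, 5L, 5L, 5L, 7L, 7L, 5L)

# Assumed bins: (0, 4] -> Low, (4, 6] -> Medium, (6, 10] -> High.
quality_ranges <- cut(
  quality,
  breaks = c(0, 4, 6, 10),
  labels = c("Low", "Medium", "High"),
  ordered_result = TRUE
)

# Count the wines in each range.
counts <- table(quality_ranges)
print(counts)
```

Using an ordered factor keeps Low < Medium < High, which matters for plot ordering and for comparisons between groups.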

Most of the ratings are 5 or 6. Very few wines get the extreme values of 3 or 8. I would have expected to find some 9s or 10s. Perhaps Portuguese wines do not reach such high quality, or perhaps the best ones were left out of this study because of their price.

##       X fixed.acidity volatile.acidity citric.acid residual.sugar
## 268 268           7.9             0.35        0.46            3.6
## 279 279          10.3             0.32        0.45            6.4
## 391 391           5.6             0.85        0.05            1.4
## 441 441          12.6             0.31        0.72            2.2
## 456 456          11.3             0.62        0.67            5.2
## 482 482           9.4             0.30        0.56            2.8
##     chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
## 268     0.078                  15                   37  0.9973 3.35
## 279     0.073                   5                   13  0.9976 3.23
## 391     0.045                  12                   88  0.9924 3.56
## 441     0.072                   6                   29  0.9987 2.88
## 456     0.086                   6                   19  0.9988 3.22
## 482     0.080                   6                   17  0.9964 3.15
##     sulphates alcohol quality relative.volatile.acidity
## 268      0.86    12.8       8                0.04430380
## 279      0.82    12.6       8                0.03106796
## 391      0.82    12.9       8                0.15178571
## 441      0.82     9.8       8                0.02460317
## 456      0.69    13.4       8                0.05486726
## 482      0.92    11.7       8                0.03191489
##     relative.citric.acid relative.sulfur.dioxide quality.ranges
## 268          0.058227848               0.4054054           High
## 279          0.043689320               0.3846154           High
## 391          0.008928571               0.1363636           High
## 441          0.057142857               0.2068966           High
## 456          0.059292035               0.3157895           High
## 482          0.059574468               0.3529412           High

It seems that the best-rated wines mostly have a high alcohol content.

##         X fixed.acidity volatile.acidity citric.acid residual.sugar
## 460   460          11.6            0.580        0.66           2.20
## 518   518          10.4            0.610        0.49           2.10
## 691   691           7.4            1.185        0.00           4.25
## 833   833          10.4            0.440        0.42           1.50
## 900   900           8.3            1.020        0.02           3.40
## 1300 1300           7.6            1.580        0.00           2.10
##      chlorides free.sulfur.dioxide total.sulfur.dioxide density   pH
## 460      0.074                  10                   47 1.00080 3.25
## 518      0.200                   5                   16 0.99940 3.16
## 691      0.097                   5                   14 0.99660 3.63
## 833      0.145                  34                   48 0.99832 3.38
## 900      0.084                   6                   11 0.99892 3.48
## 1300     0.137                   5                    9 0.99476 3.50
##      sulphates alcohol quality relative.volatile.acidity
## 460       0.57     9.0       3                0.05000000
## 518       0.63     8.4       3                0.05865385
## 691       0.54    10.7       3                0.16013514
## 833       0.86     9.9       3                0.04230769
## 900       0.49    11.0       3                0.12289157
## 1300      0.40    10.9       3                0.20789474
##      relative.citric.acid relative.sulfur.dioxide quality.ranges
## 460           0.056896552               0.2127660            Low
## 518           0.047115385               0.3125000            Low
## 691           0.000000000               0.3571429            Low
## 833           0.040384615               0.7083333            Low
## 900           0.002409639               0.5454545            Low
## 1300          0.000000000               0.5555556            Low

At first glance, these wines share no attribute pattern that could explain their low rating.

Univariate Analysis

What is the structure of your dataset?

There are 1599 wines in the dataset with 12 features:
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
12 - quality (score between 0 and 10)

The first 11 attributes are numerical. The last one (quality) is stored as an integer score but is treated here as an ordered categorical variable.

Observations:
* Most quality scores are 5 or 6.
* Wines with a quality of 8 tend to have high alcohol values.
* The median alcohol content is around 10%.
* Almost all wines have a pH between 3 and 4.
* The density of wine is slightly lower than that of water.

What is/are the main feature(s) of interest in your dataset?

The main features of interest in the data set are pH, alcohol and quality, although all the attributes surely play a part in the flavor of the wine. Wine is a very complex product, and many factors contribute to its quality.

What other features in the dataset do you think will help support your investigation?

The density, acidity and sugar/chlorides likely contribute to the quality of the wine. I think extreme values of any of these break the balance of flavors on the palate.

Did you create any new variables from existing variables in the dataset?

I created the following additional variables:
1. Relative volatile acidity: volatile acidity relative to fixed acidity.
2. Relative citric acid: citric acid relative to fixed acidity.
3. Relative sulfur dioxide: free sulfur dioxide relative to total sulfur dioxide.
4. Quality ranges: quality grouped into Low, Medium and High values.
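The three ratio variables can be reproduced as simple divisions. The sketch below uses a hypothetical toy data frame, `wqr_sample`, holding three rows copied from the printed output in this report; the results match the `relative.*` values shown for those rows.

```r
# Three rows copied from the printed output in this report (X = 837, 1082, 152).
# `wqr_sample` is a hypothetical name; the full data set is not loaded here.
wqr_sample <- data.frame(
  X                    = c(837, 1082, 152),
  fixed.acidity        = c(6.7, 7.9, 9.2),
  volatile.acidity     = c(0.28, 0.30, 0.52),
  citric.acid          = c(0.28, 0.68, 1.00),
  free.sulfur.dioxide  = c(36, 37.5, 32),
  total.sulfur.dioxide = c(100, 289, 69)
)

# Each derived variable is one attribute divided by a related one.
wqr_sample <- transform(
  wqr_sample,
  relative.volatile.acidity = volatile.acidity / fixed.acidity,
  relative.citric.acid      = citric.acid / fixed.acidity,
  relative.sulfur.dioxide   = free.sulfur.dioxide / total.sulfur.dioxide
)

print(wqr_sample[, c("X", "relative.volatile.acidity",
                     "relative.citric.acid", "relative.sulfur.dioxide")])
```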

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form
of the data? If so, why did you do this?

Most distributions are normal or right-skewed.
The citric acid distribution appears bimodal, with peaks around 0 and 0.49.

Apart from adding the new variables, no further operations on the data were necessary.

Bivariate Plots Section

We can find the following correlations:
1. Quality and alcohol (0.476)
2. Quality and volatile acidity (-0.391)
3. Sulphates and citric acid (0.313)
4. pH and fixed acidity (-0.683)
5. pH and citric acid (-0.542)

Other relationships that I want to check are:
1. Quality and relative sulfur dioxide
2. Quality and citric acid
3. Quality and fixed acidity
4. Quality and pH
5. Quality and residual sugar
6. Quality and chlorides

1. Relative Volatile Acidity

## 
##  Pearson's product-moment correlation
## 
## data:  fixed.acidity and volatile.acidity
## t = -10.589, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.3013681 -0.2097433
## sample estimates:
##        cor 
## -0.2561309

We can observe a slight tendency for volatile acidity to decrease as fixed acidity increases.
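The t statistic and confidence interval in the output above follow directly from the correlation and the sample size. As a sketch, they can be recomputed by hand with the standard t formula and the Fisher z-transformation. The variable names are illustrative.

```r
# Values reported above for fixed.acidity vs volatile.acidity.
r  <- -0.2561309
df <- 1597          # n - 2, so n = 1599

# t statistic for the null hypothesis of zero correlation.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z-transformation.
n    <- df + 2
z    <- atanh(r)
half <- qnorm(0.975) / sqrt(n - 3)
ci   <- tanh(c(z - half, z + half))

print(t_stat)
print(ci)
```

With 1,599 observations even a weak correlation yields a very small p-value, so the size of the correlation is more informative here than its significance.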

2. Relative Citric Acid

## 
##  Pearson's product-moment correlation
## 
## data:  fixed.acidity and citric.acid
## t = 36.234, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6438839 0.6977493
## sample estimates:
##       cor 
## 0.6717034

We can observe a tendency for fixed acidity to increase with citric acid.

To summarize the acidity results: citric acid rises with fixed acidity, and volatile acidity tends to fall as fixed acidity rises.

3. Relative Sulfur Dioxide

## 
##  Pearson's product-moment correlation
## 
## data:  total.sulfur.dioxide and free.sulfur.dioxide
## t = 35.84, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6395786 0.6939740
## sample estimates:
##       cor 
## 0.6676665

We can observe a tendency for free sulfur dioxide to increase with total sulfur dioxide. The increase flattens out as the total rises.

4. Sulphates and Citric Acid

## 
##  Pearson's product-moment correlation
## 
## data:  sulphates and citric.acid
## t = 13.159, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2678558 0.3563278
## sample estimates:
##     cor 
## 0.31277

There is a slight positive relationship between sulphates and citric acid (0.31).

5. pH and Fixed Acidity

## 
##  Pearson's product-moment correlation
## 
## data:  pH and fixed.acidity
## t = -37.366, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.7082857 -0.6559174
## sample estimates:
##        cor 
## -0.6829782

As expected, the pH falls (becomes more acidic) as fixed acidity rises.

6. pH and Citric Acid

## 
##  Pearson's product-moment correlation
## 
## data:  pH and citric.acid
## t = -25.767, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5756337 -0.5063336
## sample estimates:
##        cor 
## -0.5419041

As in the previous case, and as expected, the pH falls as citric acid rises.

7. Quality and pH

## 
##  Pearson's product-moment correlation
## 
## data:  quality and pH
## t = -2.3109, df = 1597, p-value = 0.02096
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.106451268 -0.008734972
## sample estimates:
##         cor 
## -0.05773139

Quality and pH are only very weakly correlated (-0.06), although the boxplot shows that the higher-quality wines have a slightly lower (more acidic) typical pH than the rest.

8. Quality and Volatile Acidity

## 
##  Pearson's product-moment correlation
## 
## data:  quality and volatile.acidity
## t = -16.954, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4313210 -0.3482032
## sample estimates:
##        cor 
## -0.3905578

As the correlation (-0.39) shows, lower volatile acidity is associated with higher wine quality.

9. Quality and Alcohol

## 
##  Pearson's product-moment correlation
## 
## data:  quality and alcohol
## t = 21.639, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4373540 0.5132081
## sample estimates:
##       cor 
## 0.4761663

10. Quality and Fixed Acidity

## 
##  Pearson's product-moment correlation
## 
## data:  quality and fixed.acidity
## t = 4.996, df = 1597, p-value = 6.496e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.07548957 0.17202667
## sample estimates:
##       cor 
## 0.1240516

Fixed acidity is weakly positively associated with wine quality (0.12), unlike volatile acidity, which is negatively associated with it.

Perhaps acidity is pleasant on the palate but not on the nose.

11. Quality and Residual Sugar

## 
##  Pearson's product-moment correlation
## 
## data:  quality and residual.sugar
## t = 0.5488, df = 1597, p-value = 0.5832
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.03531327  0.06271056
## sample estimates:
##        cor 
## 0.01373164

Neither the graphs nor the correlation (0.01) show any relationship between quality and residual sugar.

12. Quality and Chlorides

## 
##  Pearson's product-moment correlation
## 
## data:  quality and chlorides
## t = -5.1948, df = 1597, p-value = 2.313e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.17681041 -0.08039344
## sample estimates:
##        cor 
## -0.1289066

The graphs and the correlation (-0.13) show a slight tendency for wine quality to improve as chlorides decrease.

13. Alcohol and Density

## 
##  Pearson's product-moment correlation
## 
## data:  alcohol and density
## t = -22.838, df = 1597, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5322547 -0.4583061
## sample estimates:
##        cor 
## -0.4961798

There is a clear relationship between alcohol and density (-0.50). This was expected, since alcohol is less dense than water, so a higher alcohol content lowers the density of the wine.
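The direction of this effect can be illustrated with a deliberately simplified mixing estimate. The densities below are approximate textbook values at 20 °C, not taken from the data set, and the function `ideal_mix_density` is hypothetical: it treats wine as an ideal water and ethanol mixture, ignoring volume contraction and dissolved sugar, acids and salts.

```r
# Approximate densities at 20 degrees C in g/cm^3 (textbook values, not from the data set).
rho_water   <- 0.998
rho_ethanol <- 0.789

# Ideal volume-weighted mixture; ignores volume contraction and dissolved solids.
ideal_mix_density <- function(abv_percent) {
  a <- abv_percent / 100
  (1 - a) * rho_water + a * rho_ethanol
}

# Minimum, median and maximum alcohol from the summary: 8.4%, 10.2%, 14.9%.
estimate <- ideal_mix_density(c(8.4, 10.2, 14.9))
print(estimate)
```

The estimates come out below the measured densities (0.990 to 1.004), which is consistent with dissolved solids raising the density. The sketch only shows why more alcohol pulls the density down.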

Bivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

The most highly rated wines tend to have more alcohol and lower volatile acidity and, to a much lesser extent, more fixed acidity and a lower pH. It is likely that anything that adds flavor to the wine raises its rating, provided its smell is not exaggerated. There is a moderate positive correlation between quality and alcohol (0.48), the strongest found for quality. One possible explanation is that more mature wines tend to be of higher quality, and that a longer maturation gives more time for sugar to ferment into alcohol.
It would have been very interesting to have information about the maturation time of the wine in the data set.

Did you observe any interesting relationships between the other features?

At the beginning of the analysis, I expected a clear, directly proportional relationship between volatile and fixed acidity, citric acid and fixed acidity, free and total SO2, and citric acid and sulphates. This held in every case except volatile and fixed acidity, where volatile acidity decreases as fixed acidity increases.

What was the strongest relationship you found?

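Among the correlations computed above, the strongest is between pH and fixed acidity (-0.68), followed by fixed acidity and citric acid (0.67) and free and total sulfur dioxide (0.67). Outside these chemically linked pairs, the strongest relationships are between alcohol and density (-0.50) and between alcohol and quality (0.48).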

Multivariate Plots Section

1. Acids relationships

As already anticipated, fixed acidity is inversely related to volatile acidity but directly related to citric acid.

2. SO2 relationships

There is no relationship between sulphates and sulfur dioxide, although free and total sulfur dioxide are related, as previously noted.

3. Quality positive factors

The relationship between acidity and pH is clear at every quality level, as is the higher concentration of alcohol in high-quality wines.

4. Quality negative factors

Wine quality also clearly tends to improve as volatile acidity and chlorides decrease.

5. Linear Model

## 
## Calls:
## m1: lm(formula = quality ~ alcohol, data = wqr)
## m2: lm(formula = quality ~ alcohol + pH, data = wqr)
## m3: lm(formula = quality ~ alcohol + pH + fixed.acidity, data = wqr)
## m4: lm(formula = quality ~ alcohol + pH + fixed.acidity + volatile.acidity, 
##     data = wqr)
## m5: lm(formula = quality ~ alcohol + pH + fixed.acidity + volatile.acidity + 
##     chlorides, data = wqr)
## 
## ==========================================================================================
##                          m1            m2            m3            m4            m5       
## ------------------------------------------------------------------------------------------
##   (Intercept)           1.875***      4.426***      3.132***      3.588***      3.925***  
##                        (0.175)       (0.387)       (0.598)       (0.571)       (0.603)    
##   alcohol               0.361***      0.386***      0.381***      0.328***      0.324***  
##                        (0.017)       (0.017)       (0.017)       (0.017)       (0.017)    
##   pH                                 -0.850***     -0.541***     -0.264        -0.333*    
##                                      (0.116)       (0.159)       (0.153)       (0.158)    
##   fixed.acidity                                     0.039**       0.021         0.018     
##                                                    (0.014)       (0.013)       (0.013)    
##   volatile.acidity                                               -1.262***     -1.248***  
##                                                                  (0.100)       (0.100)    
##   chlorides                                                                    -0.652     
##                                                                                (0.375)    
## ------------------------------------------------------------------------------------------
##   R-squared             0.227         0.252         0.256         0.324         0.325     
##   adj. R-squared        0.226         0.251         0.254         0.322         0.323     
##   sigma                 0.710         0.699         0.697         0.665         0.665     
##   F                   468.267       268.888       182.731       190.776       153.418     
##   p                     0.000         0.000         0.000         0.000         0.000     
##   Log-likelihood    -1721.057     -1694.466     -1690.443     -1613.880     -1612.366     
##   Deviance            805.870       779.508       775.596       704.768       703.435     
##   AIC                3448.114      3396.931      3390.886      3239.760      3238.733     
##   BIC                3464.245      3418.440      3417.772      3272.022      3276.373     
##   N                  1599          1599          1599          1599          1599         
## ==========================================================================================
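Several rows of the table above can be cross-checked from the others. The sketch below recomputes adjusted R-squared, AIC and BIC for m1 and m5 from the reported R-squared and log-likelihood, using the standard formulas. Inputs are the rounded values from the table, so results agree only to about three decimals.

```r
n <- 1599  # observations (row N of the table)

# Reported values for m1 (1 predictor) and m5 (5 predictors).
models <- data.frame(
  model     = c("m1", "m5"),
  p         = c(1, 5),
  r_squared = c(0.227, 0.325),
  log_lik   = c(-1721.057, -1612.366)
)

# Adjusted R-squared penalises each extra predictor.
models$adj_r_squared <- 1 - (1 - models$r_squared) * (n - 1) / (n - models$p - 1)

# A linear model estimates p slopes, one intercept and the residual variance.
k <- models$p + 2
models$aic <- -2 * models$log_lik + 2 * k
models$bic <- -2 * models$log_lik + log(n) * k

print(models)
```

AIC falls from m4 to m5 while BIC rises, which reflects BIC's heavier penalty per parameter and suggests that adding chlorides contributes little.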

Multivariate Analysis

Talk about some of the relationships you observed in this part of the investigation.

Citric acid is clearly low in the low-quality samples, and volatile acidity and fixed acidity tend to be lower as wine quality increases.

There is no relationship between sulphates and sulfur dioxide, although free and total sulfur dioxide are related, as previously noted.

Wine quality also clearly tends to improve as volatile acidity and chlorides decrease.

Were there any interesting or surprising interactions between features?

In general I am surprised that the relationship between the attributes of a wine and its rating is not very strong. Some attributes, such as residual sugar or SO2, do not seem to influence the rating at all.

OPTIONAL: Did you create any models with your dataset?

Yes, but the linear model is not very accurate (R-squared of 0.325), because the relationships between the subjective quality score and the attributes of the wines are not strong.


Final Plots and Summary

Plot One

Description One

Fixed acidity is inversely related to volatile acidity but directly related to citric acid. Citric acid is clearly low in the low-quality samples, and volatile acidity and fixed acidity tend to be lower as wine quality increases.

Plot Two

Description Two

There is a moderate positive correlation between quality and alcohol (0.48), the strongest found for quality. One possible explanation is that more mature wines tend to be of higher quality, and that a longer maturation gives more time for sugar to ferment into alcohol.
It would have been very interesting to have information about the maturation time of the wine in the data set.

Plot Three

Description Three

One of the clearest relationships in this dataset is between alcohol and density (-0.50). This was expected, since alcohol is less dense than water, so a higher alcohol content lowers the density of the wine.


Reflection

In conclusion, I believe the factor that contributes most to the quality of the wine (the subjective evaluation of the jury) is its maturation time. Maturation time is not among the attributes of the dataset, but I infer it from the alcohol content, on the assumption that more alcohol results from a longer maturation process.
Other factors that positively influence the quality of the wine are the nuances of taste and smell that come from the balance of acids found in it.
An acidic aroma lowers the rating of the wine, as observed in the study.

The biggest difficulty I found in this dataset was finding clear relationships between the attributes. A quality wine is full of flavors, nuances, textures and smells, and each of these characteristics is reflected differently in the chemical attributes of the wine.
It is clear that being an enologist is complicated!

In part, the weak relationship between the attributes and the rating explains why the linear prediction model is not accurate.

It would have been very interesting to have data such as the maturity of the wine, the production process used, the type of grape, the type of barrel used, etc. The winemaking process is very complex and any factor can change the final result.